How to retrieve in this scenario?

krishnadhakal · May 29, 2025, 2:16am

I have daily news articles scraped from a custom search engine for my knowledge base. I run the searches on more than 60 topics.

In my local chatbot app I ask questions like: “What is the latest on Intel?”

What is the best retriever to use in this kind of scenario?

Normal semantic search does not return relevant chunks in the top_k.

aaraya · May 29, 2025, 4:10pm

Sounds like you’re building a personal RAG system and looking for the best retriever for a knowledge base filled with scraped news articles. I actually worked on a similar project not long ago, and we wrote a blog about it that might help you out. It covers how we built the system, the improvements we made, and the results we got (pay attention to the sections Contextualized Chunks and Information Retrieval techniques):
https://www.ridgerun.ai/post/on-premise-retrieval-augmented-generation-system-how-we-designed-and-implemented-a-rag-for-ridgerun

In our case, we started with standard semantic search, but it wasn’t enough to get the most relevant chunks at the top. What really helped was adding a re-ranking step using ColBERT, plus tuning how we chunked the documents.
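To give a rough idea of what the re-ranking step does, here is a minimal sketch of ColBERT-style late-interaction scoring (often called MaxSim) applied to the candidates returned by a first-stage semantic search. It is a toy illustration, not the actual ColBERT model and not the code from our project: the function names (`maxsim_score`, `rerank`) are hypothetical, and the token vectors are assumed to come from whatever token-level encoder you use.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token vector, take its best
    cosine match among the chunk's token vectors, then sum those maxima."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    if not d:
        return 0.0
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def rerank(query_tokens, candidates, top_k=3):
    """Re-order first-stage candidates by MaxSim score.

    `candidates` maps a chunk id to that chunk's token vectors
    (hypothetical structure, assumed for this sketch).
    """
    scored = [
        (maxsim_score(query_tokens, tokens), chunk_id)
        for chunk_id, tokens in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:top_k]]
```

In practice you would retrieve a larger candidate set with the semantic search (say, the top 50) and only pass the re-ranked top few chunks to the LLM.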

Also, just in case you haven’t already done this, I’d suggest reviewing how you generate your chunks. Scraping often brings in a lot of noise or extra characters that can mess up the embeddings and hurt retrieval quality.
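As an example of the kind of cleanup I mean, here is a small sketch that uses only the Python standard library. The function name `clean_scraped_text` and the specific cleanup rules are assumptions for illustration; what counts as noise depends on your scraper and sources, and a real HTML parser is a better choice if you are dealing with full pages.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")   # crude tag matcher, fine for leftover markup
WS_RE = re.compile(r"\s+")


def clean_scraped_text(raw):
    """Remove common scraping noise before chunking and embedding."""
    text = TAG_RE.sub(" ", raw)                  # drop leftover HTML tags
    text = html.unescape(text)                   # &amp; -> &, &nbsp; -> U+00A0
    text = unicodedata.normalize("NFKC", text)   # unify odd Unicode forms
    # Drop control/format characters (e.g. zero-width spaces) but keep whitespace.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    return WS_RE.sub(" ", text).strip()          # collapse runs of whitespace
```

Running this on each article before chunking keeps stray markup and invisible characters out of the embeddings.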

There’s no one-size-fits-all retriever, so depending on how your data is structured, some methods might work better than others.

Hope this helps, and feel free to reach out if you want to dig into the details.

Adrian Araya
Machine Learning Engineer at RidgeRun.ai
Contact us: support@ridgerun.ai
